Text File | 1995-01-07 | 14.9 KB | 122 lines | [TEXT/R*ch] |
-
- Documentation for UUENCODE/DECODE 5.15
-
- UU-encoding is a way to code a file which may contain any characters into a
- standard character set that can be reliably sent over diverse networks.
-
-
- THE CHARACTER ENCODING:
-
- The basic scheme is to break groups of 3 eight bit characters (24 bits) into 4
- six bit characters and then add 32 (a space) to each six bit character which
- maps it into the readily transmittable character. Another way of phrasing this
- is to say that the encoded 6 bit characters are mapped into the set:
- `!"#$%&'()*+,-./0123456789:;<=>?@ABC...XYZ[\]^_
- for transmission over communications lines.
-
- As some transmission mechanisms compress or remove spaces, spaces are changed
- into back-quote characters (a 96). (A better scheme might be to use a bias of
- 33 so the space is not created, but this is not done.)
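-
- As a minimal sketch of the scheme just described (the function names are
- illustrative, not taken from the actual encode program), three bytes can be
- split into four six bit values, biased by 32, with a space replaced by a
- back-quote:
-

```python
def split_group(b0, b1, b2):
    """Split three 8-bit values (24 bits) into four 6-bit values."""
    return [
        b0 >> 2,                          # top 6 bits of byte 0
        ((b0 & 0x03) << 4) | (b1 >> 4),   # low 2 of byte 0, top 4 of byte 1
        ((b1 & 0x0F) << 2) | (b2 >> 6),   # low 4 of byte 1, top 2 of byte 2
        b2 & 0x3F,                        # low 6 bits of byte 2
    ]


def to_uu_char(six):
    """Add the bias of 32; a value of 0 (a space) becomes a back-quote (96)."""
    return "`" if six == 0 else chr(six + 32)


def uu_encode_group(b0, b1, b2):
    """Encode one group of three bytes as four UU characters."""
    return "".join(to_uu_char(v) for v in split_group(b0, b1, b2))
```

- For example, the three bytes of "Cat" (0x43, 0x61, 0x74) come out as the four
- characters 0V%T.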
-
- Another newer, less popular encoding method, called XX-encoding, uses the set:
- +-01..89ABC...XYZabc...xyz
-
- In my opinion, XX-encoding is superior to UU-encoding because it uses more
- "normal" characters that are less likely to get corrupted. In fact several of
- the special characters in the UU set do not get thru an EBCDIC to ASCII
- translation correctly. Conversely, an advantage of the UU set is that it does
- not use lower case characters. Nowadays both upper and lower case are sent
- with no problems; maybe in the communications dark ages, there was a problem
- with lower case.
-
- This "UU" encode/decode pair can handle either XX or UU encoding. The encode
- program defaults to creating a UU encoded file; but can be run with a "-x"
- option to create an XX encoding.
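-
- The two character sets can be viewed as two 64-entry lookup tables indexed by
- the same six bit values. The sketch below builds both tables from the sets
- listed above; the names UU_SET, XX_SET and map_values are assumptions made
- for illustration and do not come from the programs themselves:
-

```python
import string

# UU set: value 0 is a back-quote, values 1..63 are chr(value + 32).
UU_SET = "`" + "".join(chr(v + 32) for v in range(1, 64))

# XX set: "+-", then the digits, then upper case, then lower case.
XX_SET = "+-" + string.digits + string.ascii_uppercase + string.ascii_lowercase


def map_values(values, table):
    """Map a sequence of 6-bit values through a 64-character table."""
    return "".join(table[v] for v in values)
```

- With the six bit values 16, 54, 5, 52 (the bytes of "Cat"), the UU table
- gives 0V%T and the XX table gives Eq3o.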
-
- The decode program defaults to autodetect. However the program can get confused
- by comment lines preceding the actual encoded data. The decode mode can be
- forced to UU or XX with the "-u" or "-x" parameter.
-
- Another option is for the character mapping table to be inserted at the front of
- the file. The format for this is discussed later. The table parameters are
- detected and used by this decode program. (A table will override the "-x" or
- "-u" parameters.) The encode program can be run with a "-t" option which tells
- it to put the table into the encoded file.
-
- A third encode mapping is the one used by Brad Templeton's ABE program. This is
- not handled by these programs as the check and control information surrounding
- the actual encoded data is in a different form.
-
- From a theoretical view, this encoding is breaking down 24 bits modulo 64. Note
- that 64**4 = 2**24. The result is 24 bits in for 32 bits out, a 33% size
- increase. Note that 85**5 > 2**32. Also note that there are 94 transmittable
- ASCII characters (from 0x21 thru 0x7e). Thus modulo 85 encoding (the atob
- encoder) transforms 32 bits to 5 ASCII chars or 40 bits for a 25% size increase.
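-
- The arithmetic behind these figures can be checked directly. This is only a
- sanity check of the numbers; size_increase is a hypothetical helper name:
-

```python
def size_increase(bits_in, bits_out):
    """Fractional growth when bits_in input bits become bits_out output bits."""
    return (bits_out - bits_in) / bits_in


# Four base-64 digits hold exactly 24 bits.
assert 64 ** 4 == 2 ** 24

# Five base-85 digits are enough for 32 bits; four are not.
assert 85 ** 4 < 2 ** 32 < 85 ** 5

# Printable ASCII characters from 0x21 thru 0x7e.
assert 0x7E - 0x21 + 1 == 94

uu_increase = size_increase(24, 32)    # 3 bytes in, 4 characters out
atob_increase = size_increase(32, 40)  # 4 bytes in, 5 characters out
```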
-
- The trade off in the modulo 85 encoding is that many communications systems do
- not reliably transmit 85 ASCII characters. The tilde, caret, brackets, and
- sometimes upper or lower case frequently get corrupted.
-
- COMPOSING A LINE OF ENCODED CHARACTERS:
-
- A small number of eight bit characters are encoded into a single line and a
- count is put at the start of the line. (Most lines in an encoded file carry 45
- original characters, encoded as 60. When you look at a UU-encoded file note that
- most lines start with the letter "M". "M" is decimal 77 which, minus the 32
- bias, is 45.)
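-
- A line can be sketched as a count character followed by the encoded groups.
- The function name uu_line is hypothetical, and padding a short final group
- with zero bytes is an assumption about what a typical encoder does, not a
- statement about this particular program:
-

```python
def uu_line(chunk):
    """Encode up to 45 bytes as one UU line: count character, then data."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a line carries between 1 and 45 bytes")

    def to_char(six):
        # Bias of 32; a space (value 0) is sent as a back-quote.
        return "`" if six == 0 else chr(six + 32)

    # Assumption: pad the last group with zero bytes to a multiple of 3.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    out = [to_char(len(chunk))]  # the count uses the same mapping
    for i in range(0, len(padded), 3):
        b0, b1, b2 = padded[i:i + 3]
        out.append(to_char(b0 >> 2))
        out.append(to_char(((b0 & 0x03) << 4) | (b1 >> 4)))
        out.append(to_char(((b1 & 0x0F) << 2) | (b2 >> 6)))
        out.append(to_char(b2 & 0x3F))
    return "".join(out)
```

- A full line of 45 bytes yields the count character "M" followed by 60 data
- characters.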
-
- This encode program puts a check character at the end of each line. The check
- is the sum of all the encoded characters, before adding the mapping, modulo 64.
-
- Note: Horton 9/1/87 UUENCODE has a bug in the line check algorithm; it uses the
- sum of the original, not the encoded characters. This decode program accepts
- either form of line check character.
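-
- The two forms of line check can be compared side by side. In this sketch the
- names are hypothetical, and summing only the data values (leaving out the
- count character) is an assumption; the text above does not say whether the
- count is included:
-

```python
def six_bit_values(chunk):
    """Return the 6-bit values for a chunk, before the character mapping."""
    padded = chunk + b"\0" * (-len(chunk) % 3)  # assumed zero padding
    values = []
    for i in range(0, len(padded), 3):
        b0, b1, b2 = padded[i:i + 3]
        values += [
            b0 >> 2,
            ((b0 & 0x03) << 4) | (b1 >> 4),
            ((b1 & 0x0F) << 2) | (b2 >> 6),
            b2 & 0x3F,
        ]
    return values


def line_check(chunk):
    """Check value as described: sum of the encoded values, modulo 64."""
    return sum(six_bit_values(chunk)) % 64


def horton_line_check(chunk):
    """Horton-style variant: sum of the original bytes, modulo 64."""
    return sum(chunk) % 64
```

- For the bytes of "Cat" the first form gives 63 and the Horton form gives 24,
- which is why a decoder has to try both.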
-
- In previous versions (4.13 and lower) the line check characters were generated by
- default by this encode program and were suppressed with the "-L" option. One
- reason to suppress them is if they will be decoded by one of the old Horton
- decoders. Most decoders either accept this form of check or simply stop looking
- after the line length is exhausted. My feelings are mixed about the line
- checksums because errors of this type essentially never occur.
-
- However with modern, error-free communications systems and with the CRC checks
- on the entire file (see below) I have made the default for uuencoding to have NO
- line level check characters effective version 4.21. The "-L" option on uuencode
- turns on generation of line checksums. If you have a really bad communications
- system and you want to isolate a problem, turn them on.
-
- Uudecode automatically checks for the presence of line checksums, so the default
- for uudecode is to leave line level checks on; if there are some problems the
- "-L" option for uudecode turns them off. Sometimes there is junk at the end of
- the line which causes spurious line checksum errors.
-
- I have encountered various other ways that encoders end lines. One encoder put
- an "M" at both the start and end of the line. Another used a line count
- character. This decode program checks all of these. I would not be surprised
- if some encoder out there ends lines with astrological symbols. If you
- encounter some other weird form of encoded file, let me know.
-
-
- PACKAGING THE LINES INTO FILES:
-
- The lines of encoded data can be preceded by comments and by network addressing
- information. The encoded data is directly preceded by a line containing:
-
- begin <file-mode> <file-name>
-
- This line is created by the encoding program. The decode program scans the file
- looking for "begin" in column 1.
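-
- The scan for the "begin" line might look roughly like this. The function
- name find_begin and the returned tuple are assumptions for illustration; the
- real decode program may do additional validation:
-

```python
def find_begin(lines):
    """Return (index, file_mode, file_name) for the first "begin" line that
    starts in column 1, or None if there is no such line."""
    for index, line in enumerate(lines):
        if line.startswith("begin "):
            parts = line.rstrip("\r\n").split(None, 2)
            if len(parts) == 3:
                return index, parts[1], parts[2]
    return None
```

- Comment lines and mail headers ahead of the data are skipped because they do
- not start with "begin" in column 1.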
-
- The final end of encoded data is an encoded line with zero encoded characters (a
- back-quote), followed by a line containing "end".
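-
- Putting the pieces together, a bare encoded file (without the section
- headers and checksums this program adds) can be sketched as follows. The
- names uu_line and uu_file are hypothetical, zero padding of the last group
- is assumed, and the file mode is written in octal:
-

```python
def uu_line(chunk):
    """Encode up to 45 bytes as one line: count character, then data."""
    def to_char(six):
        return "`" if six == 0 else chr(six + 32)

    padded = chunk + b"\0" * (-len(chunk) % 3)  # assumed zero padding
    out = [to_char(len(chunk))]
    for i in range(0, len(padded), 3):
        b0, b1, b2 = padded[i:i + 3]
        out.append(to_char(b0 >> 2))
        out.append(to_char(((b0 & 0x03) << 4) | (b1 >> 4)))
        out.append(to_char(((b1 & 0x0F) << 2) | (b2 >> 6)))
        out.append(to_char(b2 & 0x3F))
    return "".join(out)


def uu_file(name, mode, data):
    """Return the lines of a minimal UU-encoded file for the given data."""
    lines = ["begin {:o} {}".format(mode, name)]
    for i in range(0, len(data), 45):
        lines.append(uu_line(data[i:i + 45]))
    lines.append("`")    # an encoded line with zero encoded characters
    lines.append("end")
    return lines
```

- Calling uu_file("cat.txt", 0o644, b"Cat") returns the four lines
- "begin 644 cat.txt", "#0V%T", a lone back-quote, and "end".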
-
- For integrity checking, various encode programs insert checksums for the entire
- file. This decode program tries to check for all known types of file checksums. This
- is discussed in more detail later.
-
- This encode program puts a header line, containing the section number and file
- name, in front of every section:
-
- "section <number> of uuencode of file <file name>"
-
- At the end of a section the encode program inserts a line containing checksum
- and file size information